Domain-Specific Cross Language Retrieval: Comparing and Merging Structured and Unstructured Indices

Authors

  • Jens Kürsten
  • Maximilian Eibl
Abstract

This year, we participated in all Monolingual, Bilingual and Multilingual tasks of the Domain-Specific track. We used a redesigned version of our retrieval system prototype from 2006, which is based on the Lucene API [1]. A plugin to access the online translation services Google Translate [2] and PROMT [3] was implemented for the cross-language experiments. Furthermore, we investigated the differences between plain and structured indices and applied a data fusion approach to both index schemes. Compared to the median of all participants in the Monolingual tasks, we achieved average performance for our German and English runs and strong performance for our Russian runs. The results of the cross-language tasks were robust compared to our own monolingual experiments and better than the average of the results submitted by all participants.
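
The abstract does not say which data fusion method was applied to the plain and structured indices. As a rough illustration only, the sketch below merges two result lists with CombSUM over min-max normalized scores, a common fusion scheme chosen here as an assumption and not taken from the paper. The function names, document IDs, and scores are all hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat every document as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(*result_lists):
    """Fuse result lists by summing normalized scores per document (CombSUM).

    Assumption: this is an illustrative fusion scheme, not necessarily
    the one used in the paper.
    """
    fused = defaultdict(float)
    for scores in result_lists:
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += s
    # Highest fused score first; ties are broken by document ID.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores returned for one query by the two index schemes.
plain_index_hits = {"doc1": 2.0, "doc2": 1.0, "doc3": 0.5}
structured_index_hits = {"doc2": 4.0, "doc4": 3.0, "doc1": 1.0}

ranking = comb_sum(plain_index_hits, structured_index_hits)
for doc_id, score in ranking:
    print(f"{doc_id}\t{score:.3f}")
```

In this toy run, a document retrieved by both indices accumulates score from each list, so it can outrank a document that only one index ranks highly.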

Similar resources

Comparative Study of Degree of Bilingualism in Lexical Retrieval and Language Learning Strategies

This study compares lexical retrieval amongst monolinguals and intermediate bilinguals and advanced bilinguals. It also investigates the possible effects of their language learning strategies on their respective lexical retrieval advantage. The study used a mixed methods design and the groups consisted of 20 Persian near-monolinguals, 20 Persian-English intermediate level bilinguals, and 20 Per...

Domain-Specific Track CLEF 2005: Overview of Results and Approaches, Remarks on the Assessment Analysis

The domain-specific track aims at mono- and cross-language information retrieval on structured scientific data. This track studies retrieval in a domain-specific context using two social science databases: The German Indexing and Retrieval Testdatabase (GIRT) (fourth version GIRT-4: German/English pseudo-parallel corpus with identical documents) with 302,638 documents in total, and the Russian Soc...

Multilingual Terminology Extraction and Validation

This paper presents the automatic terminology extraction approach developed within project LIQUID. This project aims at developing a cost-effective solution for the problem of cross-language access to multilingual text databases in technical and scientific domains. Cross-Language Information Retrieval faces a major challenge: organizing unstructured textual information according to its contents...

Retrieval of Legal Documents: Combining Structured and Unstructured Information

Legal information is often accessible via portal web sites. Legal documents typically combine structured and unstructured information, the former being tagged with markup languages such as XML (Extensible Markup Language). Current information retrieval research takes into account the structured information content of documents when computing the relevance ranking. Such an approach is very promi...

Notes on Experiments with Pseudo Relevance Feedback and Data Merging In Cross-Language Retrieval

In the TREC-8 cross-language information retrieval (CLIR) track, we adopted the approach of using machine translation to prepare a source-language query for use in a target-language retrieval task. We empirically evaluated (1) the effect of pseudo relevance feedback on retrieval performance with two feedback vector length control methods in CLIR, and (2) the effect of multilingual data merging ...

Publication date: 2007